116 research outputs found
DWRSeg: Rethinking Efficient Acquisition of Multi-scale Contextual Information for Real-time Semantic Segmentation
Many current works directly adopt multi-rate depth-wise dilated convolutions
to capture multi-scale contextual information simultaneously from one input
feature map, thus improving the feature extraction efficiency for real-time
semantic segmentation. However, this design may make multi-scale contextual information difficult to access because of its unreasonable structure and hyperparameters. To lower the difficulty of extracting multi-scale contextual
information, we propose a highly efficient multi-scale feature extraction
method, which decomposes the original single-step approach into two steps: Region Residualization and Semantic Residualization. In this method, the multi-rate
depth-wise dilated convolutions take on a simpler role in feature extraction: in the second step, each performs simple semantic-based morphological filtering with one desired receptive field on the concise region-form feature maps provided by the first step, which improves their efficiency. Moreover, the
dilation rates and the capacity of the dilated convolutions for each network stage are carefully chosen to fully utilize all the region-form feature maps that can be obtained. Accordingly, we design a novel Dilation-wise Residual (DWR) module and a Simple Inverted Residual (SIR) module for the high-level and low-level stages of the network,
respectively, and form a powerful DWR Segmentation (DWRSeg) network. Extensive
experiments on the Cityscapes and CamVid datasets demonstrate the effectiveness
of our method by achieving a state-of-the-art trade-off between accuracy and
inference speed, in addition to being lighter weight. Without pretraining or
resorting to any training trick, we achieve an mIoU of 72.7% on the Cityscapes
test set at a speed of 319.5 FPS on one NVIDIA GeForce GTX 1080 Ti card, which
exceeds the latest methods by 69.5 FPS in speed and 0.8% in mIoU. The code and trained models are publicly available.
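As a rough illustration of the two-step design described in this abstract, the following PyTorch sketch pairs a plain 3x3 convolution (region residualization) with parallel multi-rate depth-wise dilated convolutions (semantic residualization) in a residual block; the channel count and dilation rates are illustrative assumptions, not the authors' exact DWR configuration.

```python
# Minimal sketch of a Dilation-wise Residual (DWR)-style block; sizes and
# dilation rates are illustrative, not the published configuration.
import torch
import torch.nn as nn

class DWRBlock(nn.Module):
    def __init__(self, channels: int, dilations=(1, 3, 5)):
        super().__init__()
        # Step 1: Region Residualization -- a plain 3x3 conv produces concise
        # region-form feature maps.
        self.region = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Step 2: Semantic Residualization -- multi-rate depth-wise dilated
        # convs, each filtering with one desired receptive field.
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d,
                      groups=channels, bias=False)
            for d in dilations
        ])
        self.fuse = nn.Sequential(
            nn.BatchNorm2d(channels * len(dilations)),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels * len(dilations), channels, 1, bias=False),
        )

    def forward(self, x):
        r = self.region(x)
        y = torch.cat([b(r) for b in self.branches], dim=1)
        return x + self.fuse(y)  # residual connection

x = torch.randn(1, 64, 32, 32)
print(DWRBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```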
A Long-Tail Friendly Representation Framework for Artist and Music Similarity
The investigation of the similarity between artists and music is crucial in
music retrieval and recommendation, and addressing the challenge of the
long-tail phenomenon is increasingly important. This paper proposes a Long-Tail
Friendly Representation Framework (LTFRF) that utilizes neural networks to
model the similarity relationship. Our approach integrates music, user,
metadata, and relationship data into a unified metric learning framework, and
employs a meta-consistency relationship as a regularization term to introduce the
Multi-Relationship Loss. Compared to the Graph Neural Network (GNN), our
proposed framework improves the representation performance in long-tail
scenarios, which are characterized by sparse relationships between artists and
music. We conduct experiments and analysis on the AllMusic dataset, and the
results demonstrate that our framework provides a favorable generalization of
artist and music representation. Specifically, on similar artist/music
recommendation tasks, the LTFRF outperforms the baseline by 9.69%/19.42% in Hit
Ratio@10, and in long-tail cases, the framework scores 11.05%/14.14% higher than the baseline in Consistent@10.
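For reference, Hit Ratio@10, the headline retrieval metric quoted above, can be computed as in the generic sketch below; the score matrix and positive sets are toy placeholders, not the AllMusic evaluation data.

```python
# Generic Hit Ratio@k: fraction of queries whose top-k retrieved items
# contain at least one true positive.
import numpy as np

def hit_ratio_at_k(scores: np.ndarray, positives: list, k: int = 10) -> float:
    hits = 0
    for row, pos in zip(scores, positives):
        topk = np.argsort(-row)[:k]          # indices of the k highest scores
        hits += bool(pos.intersection(topk)) # did we retrieve a positive?
    return hits / len(positives)

# toy example: 2 queries scored against 100 candidate items
rng = np.random.default_rng(0)
scores = rng.random((2, 100))
positives = [{3, 7}, {42}]
print(hit_ratio_at_k(scores, positives, k=10))
```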
MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation
In unsupervised domain adaptation (UDA), a model trained on source data (e.g.
synthetic) is adapted to target data (e.g. real-world) without access to target
annotation. Most previous UDA methods struggle with classes that have a similar
visual appearance on the target domain as no ground truth is available to learn
the slight appearance differences. To address this problem, we propose a Masked
Image Consistency (MIC) module to enhance UDA by learning spatial context
relations of the target domain as additional clues for robust visual
recognition. MIC enforces the consistency between predictions of masked target
images, where random patches are withheld, and pseudo-labels that are generated
based on the complete image by an exponential moving average teacher. To
minimize the consistency loss, the network has to learn to infer the
predictions of the masked regions from their context. Due to its simple and
universal concept, MIC can be integrated into various UDA methods across
different visual recognition tasks such as image classification, semantic
segmentation, and object detection. MIC significantly improves the
state-of-the-art performance across the different recognition tasks for
synthetic-to-real, day-to-nighttime, and clear-to-adverse-weather UDA. For
instance, MIC achieves an unprecedented UDA performance of 75.9 mIoU and 92.8%
on GTA-to-Cityscapes and VisDA-2017, respectively, which corresponds to an
improvement of +2.1 and +3.0 percentage points over the previous state of the art.
The implementation is available at https://github.com/lhoyer/MIC.
Comment: CVPR 2023
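A minimal sketch of the masked-consistency idea, assuming a PyTorch semantic-segmentation setting: pseudo-labels are taken from an EMA teacher on the complete image, random patches are withheld from the student's input, and a cross-entropy consistency loss is applied. The patch size, mask ratio, and `student`/`teacher` handles are placeholders rather than the released MIC implementation.

```python
# Masked-image-consistency training step (sketch); student/teacher are
# segmentation models returning per-pixel class logits [B, C, H, W].
import torch
import torch.nn.functional as F

def random_patch_mask(images, patch=32, ratio=0.7):
    b, _, h, w = images.shape
    keep = (torch.rand(b, 1, h // patch, w // patch,
                       device=images.device) > ratio).float()
    mask = F.interpolate(keep, size=(h, w), mode="nearest")
    return images * mask                      # withheld patches become zeros

@torch.no_grad()
def pseudo_labels(teacher, images):
    return teacher(images).argmax(dim=1)      # [B, H, W] class indices

def mic_loss(student, teacher, target_images):
    labels = pseudo_labels(teacher, target_images)  # from the complete image
    masked = random_patch_mask(target_images)       # withhold random patches
    logits = student(masked)                        # predict despite masking
    return F.cross_entropy(logits, labels)          # consistency loss
```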
voxel2vec: A Natural Language Processing Approach to Learning Distributed Representations for Scientific Data
Relationships in scientific data, such as the numerical and spatial
distribution relations of features in univariate data, the scalar-value
combinations' relations in multivariate data, and the association of volumes in
time-varying and ensemble data, are intricate and complex. This paper presents
voxel2vec, a novel unsupervised representation learning model, which is used to
learn distributed representations of scalar values/scalar-value combinations in
a low-dimensional vector space. Its basic assumption is that if two scalar
values/scalar-value combinations have similar contexts, they usually have high
similarity in terms of features. By representing scalar values/scalar-value
combinations as symbols, voxel2vec learns the similarity between them in the
context of spatial distribution and then allows us to explore the overall
association between volumes by transfer prediction. We demonstrate the
usefulness and effectiveness of voxel2vec by comparing it with the isosurface
similarity map of univariate data and applying the learned distributed
representations to feature classification for multivariate data and to
association analysis for time-varying and ensemble data.
Comment: Accepted by IEEE Transactions on Visualization and Computer Graphics (TVCG)
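Because the core assumption (similar spatial contexts imply similar features) maps naturally onto a word2vec-style objective, the toy sketch below treats quantized scalar values as symbols and a neighboring voxel as their context, trained with a skip-gram-style logistic loss. The vocabulary size, embedding dimension, and single-neighbor context are illustrative simplifications, not the voxel2vec model itself.

```python
# Skip-gram-style embedding of quantized scalar values (toy sketch).
import torch
import torch.nn as nn

class SkipGram(nn.Module):
    def __init__(self, n_symbols=256, dim=16):
        super().__init__()
        self.center = nn.Embedding(n_symbols, dim)
        self.context = nn.Embedding(n_symbols, dim)

    def forward(self, center_ids, context_ids, labels):
        # labels: 1 for observed (center, spatial-neighbor) pairs, 0 for negatives
        score = (self.center(center_ids) * self.context(context_ids)).sum(-1)
        return nn.functional.binary_cross_entropy_with_logits(score, labels)

# toy volume: scalars quantized into 256 bins; +x neighbors serve as context
vol = torch.randint(0, 256, (8, 8, 8))
centers = vol[:, :, :-1].reshape(-1)
contexts = vol[:, :, 1:].reshape(-1)
labels = torch.ones_like(centers, dtype=torch.float)  # positives only in this toy
model = SkipGram()
loss = model(centers, contexts, labels)
loss.backward()
```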
Inference of nonlinear causal effects with GWAS summary data
Large-scale genome-wide association studies (GWAS) have offered an exciting
opportunity to discover putative causal genes or risk factors associated with
diseases by using SNPs as instrumental variables (IVs). However, conventional
approaches assume linear causal relations, partly for simplicity and partly because only GWAS summary data are available. In this work, we propose a novel model for transcriptome-wide association studies (TWAS) to incorporate
nonlinear relationships across IVs, an exposure, and an outcome, which is
robust against violations of the valid IV assumptions and permits the use of
GWAS summary data. We decouple the estimation of a marginal causal effect and a
nonlinear transformation, where the former is estimated via sliced inverse
regression and a sparse instrumental variable regression, and the latter is
estimated by a ratio-adjusted inverse regression. On this basis, we propose an
inferential procedure. An application of the proposed method to the ADNI gene
expression data and the IGAP GWAS summary data identifies 18 causal genes
associated with Alzheimer's disease, including APOE and TOMM40, in addition to
7 other genes missed by two-stage least squares considering only linear
relationships. Our findings suggest that nonlinear modeling is required to
unleash the power of IV regression for identifying potentially nonlinear
gene-trait associations. Accompanying this paper is our Python library
nl-causal (https://github.com/nl-causal/nonlinear-causal) that implements the proposed method.
Comment: 36 pages, 8 figures
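For contrast, the abstract cites two-stage least squares (2SLS) as the linear baseline that misses several genes. The sketch below shows generic 2SLS on synthetic individual-level data; it is not the nl-causal API or the summary-data estimator proposed in the paper.

```python
# Generic two-stage least squares on synthetic data (linear baseline sketch).
import numpy as np

rng = np.random.default_rng(0)
n, p = 5000, 10
Z = rng.normal(size=(n, p))                            # instruments (e.g. SNPs)
u = rng.normal(size=n)                                 # unobserved confounder
x = Z @ rng.normal(size=p) + u + rng.normal(size=n)    # exposure
y = 0.5 * x + u + rng.normal(size=n)                   # outcome, true effect 0.5

# Stage 1: regress exposure on instruments; Stage 2: regress outcome on fitted exposure.
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
beta = np.linalg.lstsq(x_hat[:, None], y, rcond=None)[0][0]
print(round(beta, 2))                                  # close to 0.5
```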
Registration-Free Hybrid Learning Empowers Simple Multimodal Imaging System for High-quality Fusion Detection
Multimodal fusion detection always places high demands on the imaging system
and image pre-processing, while either a high-quality pre-registration system
or image registration processing is costly. Unfortunately, existing fusion methods are designed for registered source images, so the fusion of inhomogeneous features (pairs of features at the same spatial location that express different semantic information) cannot achieve satisfactory performance with these methods. As a result, we propose IA-VFDnet,
a CNN-Transformer hybrid learning framework with a unified high-quality
multimodal feature matching module (AKM) and a fusion module (WDAF), in which the AKM and WDAF work in synergy to perform high-quality infrared-aware visible fusion detection, which can be applied to smoke and wildfire detection.
Furthermore, experiments on the M3FD dataset validate the superiority of the
proposed method, with IA-VFDnet achieving better detection performance than other state-of-the-art methods under conventional registered conditions. In addition, the first unregistered multimodal smoke and wildfire detection benchmark is made openly available with this letter.
Direct observation of magnon-phonon coupling in yttrium iron garnet
The magnetic insulator yttrium iron garnet (YIG) with a ferrimagnetic
transition temperature of 560 K has been widely used in microwave and
spintronic devices. Anomalous features in the spin Seebeck effect (SSE)
voltages have been observed in Pt/YIG and attributed to the magnon-phonon
coupling. Here we use inelastic neutron scattering to map out low-energy spin
waves and acoustic phonons of YIG at 100 K as a function of increasing magnetic
field. By comparing the zero and 9.1 T data, we find that instead of splitting
and opening up gaps at the spin wave and acoustic phonon dispersion
intersecting points, magnon-phonon coupling in YIG enhances the hybridized
scattering intensity. These results are different from expectations of
conventional spin-lattice coupling, calling for new paradigms to understand the
scattering process of magnon-phonon interactions and the resulting
magnon-polarons.
Comment: 5 pages, 4 figures, PRB in press
Application of the fuzzy optimal model in the selection of the startup hub
This paper integrates the nominal group technique (NGT), the analytical hierarchy process (AHP), and the fuzzy technique for order preference by similarity to an ideal solution (TOPSIS), and a case study has been used to demonstrate the fuzzy optimal selection model. From a literature review on startup hubs and from interviews conducted with officials and experts, the selection criteria are (1) convenience - promoted by the city's entrepreneurial policies or its traffic infrastructure; and (2) potentiality - promoted by a regional network or value chain of startups. Lastly, with equal decision-making power among decision makers, the best idle land identified in this case study using the fuzzy method is the Taipei Jianguo Brewery, while differences in decision-making power might instead make the Wanbao Textile Factory the best idle land.
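The closeness-coefficient ranking at the core of TOPSIS can be sketched as follows; the decision matrix, equal weights, and benefit-type criteria are hypothetical stand-ins, whereas the study derives its weights from NGT and AHP and uses fuzzy ratings.

```python
# Crisp TOPSIS ranking sketch with an illustrative decision matrix.
import numpy as np

def topsis(matrix, weights, benefit):
    m = matrix / np.linalg.norm(matrix, axis=0)     # vector-normalize columns
    v = m * weights                                 # weighted normalized matrix
    ideal = np.where(benefit, v.max(0), v.min(0))   # positive-ideal solution
    anti  = np.where(benefit, v.min(0), v.max(0))   # negative-ideal solution
    d_pos = np.linalg.norm(v - ideal, axis=1)
    d_neg = np.linalg.norm(v - anti, axis=1)
    return d_neg / (d_pos + d_neg)                  # closeness coefficients

# two criteria (convenience, potentiality), three hypothetical candidate sites
scores = np.array([[7.0, 8.0],
                   [6.0, 9.0],
                   [8.0, 6.0]])
cc = topsis(scores, weights=np.array([0.5, 0.5]), benefit=np.array([True, True]))
print(cc.argsort()[::-1])   # candidate ranking, best first
```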
AI-Driven Patient Monitoring with Multi-Agent Deep Reinforcement Learning
Effective patient monitoring is vital for timely interventions and improved
healthcare outcomes. Traditional monitoring systems often struggle to handle
complex, dynamic environments with fluctuating vital signs, leading to delays
in identifying critical conditions. To address this challenge, we propose a
novel AI-driven patient monitoring framework using multi-agent deep
reinforcement learning (DRL). Our approach deploys multiple learning agents,
each dedicated to monitoring a specific physiological feature, such as heart
rate, respiration, and temperature. These agents interact with a generic
healthcare monitoring environment, learn the patients' behavior patterns, and
make informed decisions to alert the corresponding Medical Emergency Teams
(METs) based on the estimated level of emergency. In this study, we evaluate
the performance of the proposed multi-agent DRL framework using real-world
physiological and motion data from two datasets: PPG-DaLiA and WESAD. We
compare the results with several baseline models, including Q-Learning, PPO,
Actor-Critic, Double DQN, and DDPG, as well as monitoring frameworks like
WISEML and CA-MAQL. Our experiments demonstrate that the proposed DRL approach
outperforms all other baseline models, achieving more accurate monitoring of
patients' vital signs. Furthermore, we conduct hyperparameter optimization to fine-tune the learning process of each agent. By tuning the learning rate and discount factor, we improve the agents'
overall performance in monitoring patient health status. Our AI-driven patient
monitoring system offers several advantages over traditional methods, including
the ability to handle complex and uncertain environments, adapt to varying
patient conditions, and make real-time decisions without external supervision.
Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. arXiv admin note: text overlap with arXiv:2309.1057
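As a rough illustration of one per-vital-sign agent, the sketch below uses tabular Q-learning (one of the baselines mentioned above) over a discretized heart-rate state with a binary alert action; the toy dynamics, reward, and threshold are assumptions, not the paper's environment or its deep RL architecture.

```python
# Tabular Q-learning sketch for a single vital-sign agent: discretized
# heart-rate bins as states, {no_alert, alert} as actions.
import numpy as np

n_states, n_actions = 10, 2
q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def step(state, action):
    # toy reward: alerting in high bins (>= 8) is correct, elsewhere it is not
    reward = 1.0 if (state >= 8) == (action == 1) else -1.0
    return rng.integers(n_states), reward

state = rng.integers(n_states)
for _ in range(10000):
    action = rng.integers(n_actions) if rng.random() < eps else int(q[state].argmax())
    next_state, reward = step(state, action)
    q[state, action] += alpha * (reward + gamma * q[next_state].max() - q[state, action])
    state = next_state

print(q.argmax(axis=1))   # learned alert decision per heart-rate bin
```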
- …